9 research outputs found

    Machine Learning Guided Exploration of an Empirical Ribozyme Fitness Landscape

    Okinawa Institute of Science and Technology Graduate University, Doctor of Philosophy (doctoral thesis)
    The fitness landscape of a biomolecule is a representation of its activity as a function of its sequence. Properties of a fitness landscape determine how evolution proceeds. Therefore, the distribution of functional variants and, more importantly, the connectivity of these variants within the sequence space are important scientific questions. Exploration of these spaces, however, is impeded by the combinatorial explosion of the sequence space. High-throughput experimental methods have recently reduced this impediment, but only modestly. Better computational methods are needed to fully utilize the rich information in these experimental data and to better understand the properties of the fitness landscape. In this work, I seek to improve this exploration process by combining data from massively parallel experimental assays with smart library design using advanced computational techniques. I focus on an artificial RNA enzyme, or ribozyme, that can catalyze a ligation reaction between two RNA fragments. This chemistry is analogous to that of modern RNA polymerase enzymes and therefore represents an important reaction in the origin of life. In the first chapter, I discuss the background to this work in the context of the evolutionary theory of fitness landscapes and its implications for biotechnology. In chapter 2, I explore the use of processes borrowed from the field of evolutionary computation to solve optimization problems using real experimental sequence-activity data. In chapter 3, I investigate the use of supervised machine learning models to extract information on epistatic interactions from the dataset collected during multiple rounds of directed evolution. I investigate and experimentally validate the extent to which a deep learning model can be used to guide a completely computational evolutionary algorithm towards distant regions of the fitness landscape.
    In the final chapter, I perform a comprehensive experimental assay of the combinatorial region explored by the deep learning-guided evolutionary algorithm. Using this dataset, I analyze higher-order epistasis and attempt to explain the increased predictability of the region sampled by the algorithm. Finally, I provide the first experimental evidence of a large RNA ‘neutral network’. Altogether, this work represents the most comprehensive experimental and computational study of the RNA ligase ribozyme fitness landscape to date, providing important insights into the evolutionary search space possibly explored during the earliest stages of life.

    Experimental exploration of a ribozyme neutral network using evolutionary algorithm and deep learning

    A neutral network connects all genotypes with equivalent phenotypes in a fitness landscape and plays an important role in the mutational robustness and evolvability of biomolecules. In contrast to earlier theoretical works, evidence of large neutral networks has been lacking in recent experimental studies of fitness landscapes. This suggests that evolution could be constrained globally. Here, we demonstrate that a deep learning-guided evolutionary algorithm can efficiently identify neutral genotypes within the sequence space of an RNA ligase ribozyme. Furthermore, we measure the activities of all 2^16 (65,536) variants connecting two active ribozymes that differ by 16 mutations and analyze mutational interactions (epistasis) up to the 16th order. We discover an extensive network of neutral paths linking the two genotypes and reveal that these paths might be predicted using only information from lower-order interactions. Our experimental evaluation of over 120,000 ribozyme sequences provides important empirical evidence that neutral networks can increase the accessibility and predictability of the fitness landscape.
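
    As a rough illustration of the combinatorial region described in this abstract, two sequences that differ at 16 positions have 2^16 = 65,536 combinatorial intermediates, counting both endpoints. The sketch below enumerates them for two made-up toy sequences. The function name `intermediates` and the example sequences are hypothetical and are not taken from the paper.

```python
from itertools import product


def intermediates(seq_a, seq_b):
    """Yield every sequence that, at each position where seq_a and seq_b
    differ, carries the base from either seq_a or seq_b."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    # Positions at which the two endpoint sequences differ.
    diff = [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]
    # Each differing position independently keeps seq_a's base (0)
    # or takes seq_b's base (1).
    for choice in product((0, 1), repeat=len(diff)):
        seq = list(seq_a)
        for pos, pick in zip(diff, choice):
            if pick:
                seq[pos] = seq_b[pos]
        yield "".join(seq)


# Hypothetical toy endpoints that differ at all 16 positions.
seq_a = "ACGU" * 4
seq_b = "CAUG" * 4
variants = list(intermediates(seq_a, seq_b))
print(len(variants))
```

    Each variant corresponds to one 16-bit choice vector, which is why the count doubles with every additional differing position.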

    Analysis of the Sequence Preference of Saporin by Deep Sequencing

    Ribosome-inactivating proteins (RIPs) are RNA:adenosine glycosidases that inactivate eukaryotic ribosomes by depurinating the sarcin-ricin loop (SRL) in 28S rRNA. The GAGA sequence at the top of the SRL or at the top of a hairpin loop is assumed to be their target motif. Saporin is a RIP widely used to develop immunotoxins for research and medical applications, but its sequence specificity has not been investigated. Here, we combine the conventional aniline cleavage assay for depurinated nucleic acids with high-throughput sequencing to study sequence-specific depurination of oligonucleotides caused by saporin. Our data reveal the sequence preference of saporin for different substrates and show that the GAGA motif is not efficiently targeted by this protein in either RNA or DNA. Instead, a preference of saporin for certain hairpin DNAs was observed. The observed sequence-specific activity of saporin may be relevant to antiviral or apoptosis-inducing effects of RIPs. The developed method could also be useful for studying the sequence specificity of depurination by other RIPs or enzymes.

    Rheological Scaling of Ionic-Liquid-Based Polyelectrolytes in Ionic Liquid Solutions

    Polymerized ionic liquids (PILs) are a special class of polyelectrolytes with ionic liquid (IL) species covalently attached to the repeating unit. The rheological properties of PILs in IL solutions are strongly influenced by the electrostatic screening between IL and PIL chains. However, the effect of IL electrostatic screening on the rheology of PILs in IL solutions remains elusive. To address this challenging yet important question, we conduct detailed rheological characterization of a model system containing a PIL [PC4-TFSI: poly(1-butyl-3-vinylimidazolium bis(trifluoromethanesulfonyl)imide)] in a mixture of a salt-free solvent (DMF: dimethylformamide) and an IL [Bmim-TFSI: 1-butyl-3-methylimidazolium bis(trifluoromethanesulfonyl)imide] solution, with low to high IL concentrations, while spanning dilute and semidilute polymer regimes. We compare the specific viscosity ηsp and the longest relaxation time λ of PILs measured at various Bmim-TFSI concentrations from 0 M (pure DMF) to 3.42 M (pure Bmim-TFSI) with the scaling predictions for ordinary polyelectrolyte solutions. We find good agreement at low IL concentrations but significant deviations at higher IL concentrations. We capture this discrepancy by proposing and validating a modified scaling law accounting for the modified screening length in concentrated solutions of ordinary salts. We propose that extended PIL chains initially shrink due to the charge screening effect at low IL concentrations but revert to an expanded configuration at higher IL concentrations due to the charge underscreening effect, in which the screening length increases with increasing IL concentration. Our results provide new insights into the conformation of PILs in IL solutions and, for the first time, provide a valid reference for the study of general polyelectrolyte solutions at high salt concentrations, where the Debye–Hückel theory is no longer valid.